
Linear Models

Linear models aim to predict a target value as a linear combination of input features. Commonly represented with:

\hat{y} = w_0 + w_1 x_1 + \dots + w_p x_p

where \hat{y} is the predicted value, w_0 is the intercept, and w_1, \dots, w_p are the coefficients.

Ordinary Least Squares

  • Goal: Minimize the residual sum of squares between observed targets and predictions.
  • Method: Fits a linear model to minimize the cost function \|y - Xw\|_2^2.
  • Complexity: Computed using the singular value decomposition of X; the cost is O(n p^2), assuming n ≥ p (with n samples and p features).
from sklearn import linear_model
reg = linear_model.LinearRegression()
reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
print(reg.coef_)

Non-Negative Least Squares

  • Constrains the coefficients to be non-negative. Useful when the coefficients represent naturally non-negative quantities such as prices or counts.
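A minimal sketch, assuming a recent scikit-learn in which LinearRegression accepts positive=True; the toy data is illustrative only:
from sklearn import linear_model
# positive=True constrains every fitted coefficient to be >= 0
reg = linear_model.LinearRegression(positive=True)
reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
print(reg.coef_)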

Complexity of OLS

  • Computed using the singular value decomposition of X; the cost depends on the dimensions of the design matrix (see the complexity note above).

Ridge Regression

  • Goal: Address multicollinearity by adding a penalty on the size of coefficients.
  • Method: Minimizes \|y - Xw\|_2^2 + \alpha \|w\|_2^2, where \alpha is a complexity parameter.
  • Solver Choice: Automatically chosen based on conditions (e.g., data sparsity).
from sklearn import linear_model
reg = linear_model.Ridge(alpha=.5)
reg.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
print(reg.coef_)
print(reg.intercept_)

Ridge Classification

  • Handles binary classification by converting the targets to {-1, 1} and fitting the ridge regression objective; the sign of the prediction gives the class.
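A minimal sketch with RidgeClassifier on toy binary data (the inputs are illustrative only):
from sklearn import linear_model
# Binary targets are internally mapped to {-1, 1}; the sign of the
# regression output determines the predicted class.
clf = linear_model.RidgeClassifier(alpha=.5)
clf.fit([[0, 0], [1, 1], [2, 2], [3, 3]], [0, 0, 1, 1])
print(clf.predict([[0.5, 0.5], [2.5, 2.5]]))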

Lasso

  • Goal: Estimate sparse coefficients, effectively reducing the number of features.
  • Method: Minimizes \|y - Xw\|_2^2 + \alpha \|w\|_1.
  • Use: Important in the field of compressed sensing.
  • Feature Selection: Can be used to select features due to sparsity of solution.
from sklearn import linear_model
reg = linear_model.Lasso(alpha=0.1)
reg.fit([[0, 0], [1, 1]], [0, 1])
print(reg.predict([[1, 1]]))

Multi-task Lasso

  • Goal: Estimate sparse coefficients for multiple regression problems jointly.
  • Specificity: Same features selected across all regression tasks.
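A minimal sketch with MultiTaskLasso; Y has one column per task, and the toy data is illustrative only:
from sklearn import linear_model
# Each column of Y is a separate regression task; the same features end up
# non-zero across all tasks.
X = [[0., 0.], [1., 1.], [2., 2.]]
Y = [[0., 0.], [1., 1.], [2., 2.]]
reg = linear_model.MultiTaskLasso(alpha=0.1)
reg.fit(X, Y)
print(reg.coef_)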

Elastic-Net

  • Goal: Combine penalties of Ridge and Lasso, useful when multiple features are correlated.
  • Method: Minimizes \|y - Xw\|_2^2 + \alpha \rho \|w\|_1 + \frac{\alpha (1 - \rho)}{2} \|w\|_2^2.
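A minimal sketch with ElasticNet; l1_ratio plays the role of \rho above (1.0 is pure Lasso, 0.0 pure Ridge), and the toy data is illustrative only:
from sklearn import linear_model
# alpha sets the overall penalty strength, l1_ratio the L1/L2 mix
reg = linear_model.ElasticNet(alpha=0.1, l1_ratio=0.7)
reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
print(reg.coef_)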

Multi-task Elastic-Net

  • Goal: Similar to Multi-task Lasso but with Elastic-Net penalty.
  • Use: Applicable when tasks share the same sparse features.
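A minimal sketch with MultiTaskElasticNet on the same kind of toy multi-output data as above:
from sklearn import linear_model
# Combined L1/L2 penalty applied jointly across tasks
X = [[0., 0.], [1., 1.], [2., 2.]]
Y = [[0., 0.], [1., 1.], [2., 2.]]
reg = linear_model.MultiTaskElasticNet(alpha=0.1, l1_ratio=0.5)
reg.fit(X, Y)
print(reg.coef_)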

Least Angle Regression (LARS)

  • Goal: Efficiently compute full path of coefficients.
  • Method: Similar to forward stepwise regression, adjusts direction to stay equiangular to all variables most correlated with residual.
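A minimal sketch with Lars, capping the number of non-zero coefficients; the toy data is illustrative only:
from sklearn import linear_model
# n_nonzero_coefs limits how many features enter the model along the path
reg = linear_model.Lars(n_nonzero_coefs=1)
reg.fit([[-1, 1], [0, 0], [1, 1]], [-1.1111, 0, -1.1111])
print(reg.coef_)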

Orthogonal Matching Pursuit (OMP)

  • Goal: Approximate the best fit under a constraint on the number of non-zero coefficients.
  • Method: Greedy algorithm selecting features most correlated with the residual.
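A minimal sketch with OrthogonalMatchingPursuit; the toy data is built so that only the first feature determines the target:
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit
# With a budget of one non-zero coefficient, OMP should pick the first feature
X = np.array([[0., 1.], [1., 0.], [2., 1.], [3., 0.]])
y = np.array([0., 1., 2., 3.])
reg = OrthogonalMatchingPursuit(n_nonzero_coefs=1)
reg.fit(X, y)
print(reg.coef_)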

Bayesian Regression

  • Goal: Incorporate regularization through prior distributions.
  • Types:
    • Bayesian Ridge: Regularization parameter estimated from data.
    • Automatic Relevance Determination (ARD): Similar to Bayesian Ridge but promotes sparsity.
from sklearn import linear_model
X = [[0., 0.], [1., 1.], [2., 2.], [3., 3.]]
Y = [0., 1., 2., 3.]
reg = linear_model.BayesianRidge()
reg.fit(X, Y)
print(reg.predict([[1, 0.]]))
print(reg.coef_)

Logistic Regression

  • Goal: Linear model for classification (despite the name).
  • Method: Uses the logistic function to model the probabilities of the possible classes.
  • Regularization: L1, L2, and Elastic-Net available to penalize model complexity.
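A minimal sketch with LogisticRegression on toy binary data; C is the inverse of the regularization strength and the L2 penalty is the default:
from sklearn.linear_model import LogisticRegression
X = [[0., 0.], [1., 1.], [2., 2.], [3., 3.]]
y = [0, 0, 1, 1]
clf = LogisticRegression(C=1.0)
clf.fit(X, y)
print(clf.predict([[0.5, 0.5], [2.5, 2.5]]))
print(clf.predict_proba([[1.5, 1.5]]))  # class membership probabilities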

Generalized Linear Models (GLM)

  • Extension: Allows prediction using different distributions from the exponential family.
  • Functionality: Models relationship through a link function and extends the model family beyond normally distributed errors.
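A minimal sketch with PoissonRegressor, a GLM with a log link for count-like targets (available in recent scikit-learn versions); the toy data is illustrative only:
from sklearn.linear_model import PoissonRegressor
# Poisson GLM: log link, suitable for non-negative count-like targets
X = [[1, 2], [2, 3], [3, 4], [4, 3]]
y = [12, 17, 22, 21]
reg = PoissonRegressor(alpha=0.5)
reg.fit(X, y)
print(reg.coef_)
print(reg.intercept_)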

Quantile Regression

  • Goal: Estimate medians or other quantiles, rather than means.
  • Method: Minimizes the pinball loss; useful for constructing prediction intervals.
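A minimal sketch with QuantileRegressor (available in recent scikit-learn versions), fitting the conditional median; the outlier in the toy targets pulls the mean but barely moves the median:
import numpy as np
from sklearn.linear_model import QuantileRegressor
X = np.array([[1.], [2.], [3.], [4.], [5.]])
y = np.array([1., 2., 3., 4., 100.])  # the last target is an outlier
# quantile=0.5 targets the median; alpha=0 disables the L1 penalty
reg = QuantileRegressor(quantile=0.5, alpha=0)
reg.fit(X, y)
print(reg.predict([[3.]]))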

Polynomial Regression

  • Goal: Extend linear models using polynomial basis functions.
  • Use: Can model non-linear relationships within a linear framework.

Polynomial Features Transformation

from sklearn.preprocessing import PolynomialFeatures
import numpy as np
X = np.arange(6).reshape(3, 2)
poly = PolynomialFeatures(degree=2)
print(poly.fit_transform(X))

Polynomial Regression Pipeline

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
import numpy as np
model = Pipeline([
    ('poly', PolynomialFeatures(degree=3)),
    ('linear', LinearRegression(fit_intercept=False)),
])
x = np.arange(5)
y = 3 - 2 * x + x ** 2 - x ** 3
model.fit(x[:, np.newaxis], y)
print(model.named_steps['linear'].coef_)